Relational Text Mining and Visualization
Abstract
Discovering hidden patterns in distributed heterogeneous textual databases and unstructured data is a new challenge in data mining. Traditional data mining often assumes that preprocessing is already done, so that homogeneous data are available at the required level. For distributed heterogeneous textual data this is not the case. Complex relations between items or entities (e.g., relations between people in a fraud detection task) should be discovered and generalized. This paper offers a new hierarchical relational clustering method (the Φ-method) based on the concept of Φ-equivalence. The method can process (i) incomplete relations; (ii) relations without converting them to attributes of individual entities; and (iii) relations presented in distributed heterogeneous databases using XML tags. The clustering produced by the method is invariant and has a clear meaning and a natural visualization.
Similar resources
Concept Chain Graphs: A Hybrid IR Framework for Biomedical Text Mining
The area of biomedical text mining has seen much research activity due to the increased volume of literature that must be examined. Researchers need to validate and interpret their experimental results; this entails scouring through a massive amount of potentially relevant literature for clues that may shed light on their findings. An ideal situation would allow a user to interactively search t...
Design and Test of the Real-time Text mining dashboard for Twitter
One of today's major research trends in the field of information systems is the discovery of implicit knowledge hidden in datasets that are currently being produced at high speed, in large volumes, and in a wide variety of formats. Data with such features are called big data. Extracting, processing, and visualizing this huge amount of data has become one of the concerns of data science scholar...
Visualization of Text Streams: A Survey
This work presents related areas of research, types of data collections that are visualized, technical aspects of generating visualizations, and evaluation methodologies. Existing methods are structured and explained from the aspect of visualization process. Successful applications are noted and some future trends in the field are anticipated. Keywords— Information Visualization, Visual Analyti...
A Wordification Approach to Relational Data Mining: Early Results
This paper describes a propositionalization technique called wordification. Wordification is inspired by text mining and can be seen as a transformation of a relational database into a corpus of documents. As in previous propositionalization methods, after the wordification step any propositional data mining algorithm can be applied. The most notable advantage of the presented technique is grea...
Featurelens: Interactive Visualization of Text Patterns
Finding interesting text patterns in large text collections has been studied in the information retrieval and text mining research communities. However, using interactive visualizations to guide insights and help form new hypotheses about the text is an area in which little work has been done. We propose FeatureLens, a system for visualizing collections of text documents based on te...
Publication date: 2002